Sparsity-promoting regularizers are widely used to impose low-complexity structure (e.g. the l1-norm for sparsity) on the regression coefficients in supervised learning. In deterministic optimization, the sequences generated by iterative algorithms (such as proximal gradient descent) exhibit "finite activity identification": they identify the low-complexity structure in a finite number of iterations. However, most online algorithms (such as proximal stochastic gradient descent) lack this property because of their vanishing step-size and non-vanishing variance. In this paper, we show how combining online algorithms with a screening rule eliminates useless features from their iterates and thereby enforces finite activity identification. One consequence is that, when the rule is combined with any convergent online algorithm, the sparsity imposed by the regularizer can be exploited for computational gains. Numerically, significant acceleration can be obtained.
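As an illustration of finite activity identification in the deterministic setting, the following minimal sketch runs proximal gradient descent on a toy lasso problem and records the support (the set of nonzero coordinates) at every iteration. The problem data, the step size, and the function names are assumptions made for this example; the sketch does not implement the screening rule proposed in the paper.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |.| applied to a scalar v."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def proximal_gradient_lasso(A, b, lam, step, n_iters):
    """Minimize 0.5 * ||A x - b||^2 + lam * ||x||_1 by proximal gradient descent.

    Returns the final iterate and the support observed after each iteration.
    """
    n_rows, n_cols = len(A), len(A[0])
    x = [0.0] * n_cols
    supports = []
    for _ in range(n_iters):
        # Gradient of the smooth part: A^T (A x - b).
        residual = [sum(A[i][j] * x[j] for j in range(n_cols)) - b[i]
                    for i in range(n_rows)]
        grad = [sum(A[i][j] * residual[i] for i in range(n_rows))
                for j in range(n_cols)]
        # Gradient step followed by the l1 proximal step (soft-thresholding).
        x = [soft_threshold(x[j] - step * grad[j], step * lam)
             for j in range(n_cols)]
        supports.append(tuple(j for j in range(n_cols) if x[j] != 0.0))
    return x, supports


# Toy data (assumed for illustration): identity design, one weak feature.
A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
b = [3.0, 0.2, -2.0]
x_final, supports = proximal_gradient_lasso(A, b, lam=0.5, step=0.5, n_iters=50)
print(x_final, supports[0], supports[-1])
```

On this toy problem the weak second coordinate is set exactly to zero by the soft-thresholding step, so the support stops changing after finitely many iterations. A stochastic variant with a vanishing step size would not, in general, give this guarantee, which is the gap the abstract describes.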
Translated by Google Translate
Relation extraction (RE), which has relied on structurally annotated corpora for model training, has been particularly challenging in low-resource scenarios and domains. Recent literature has tackled low-resource RE with self-supervised learning, where the solution involves pretraining the relation embedding with an RE-based objective and finetuning on labeled data with a classification-based objective. However, a critical challenge to this approach is the gap between the two objectives, which prevents the RE model from fully utilizing the knowledge in the pretrained representations. In this paper, we aim to bridge this gap and propose to pretrain and finetune the RE model using consistent contrastive learning objectives. Since one relation may easily form multiple clusters in the representation space under this kind of representation learning paradigm, we further propose a multi-center contrastive loss that allows one relation to form multiple clusters, to better align with pretraining. Experiments on two document-level RE datasets, BioRED and Re-DocRED, demonstrate the effectiveness of our method. In particular, when using 1% of the end-task training data, our method outperforms a PLM-based RE classifier by 10.5% and 5.8% on the two datasets, respectively.
When a human communicates with a machine in natural language online, how can the machine understand the human's intention and the semantic context of what they say? This is an important AI task, as it enables the machine to construct a sensible answer or perform a useful action for the human. Meaning is represented at the sentence level, identification of which is known as intent detection, and at the word level, a labelling task called slot filling. This dual-level joint task requires innovative thinking about natural language and deep learning network design, and as a result, many approaches and models have been proposed and applied. This tutorial will discuss how the joint task is set up and introduce Spoken Language Understanding/Natural Language Understanding (SLU/NLU) with Deep Learning techniques. We will cover the datasets, experiments and metrics used in the field. We will describe how the machine uses the latest NLP and Deep Learning techniques to address the joint task, including recurrent and attention-based Transformer networks and pre-trained models (e.g. BERT). We will then look in detail at a network that explicitly allows the two levels of the task, intent classification and slot filling, to interact in order to boost performance. We will give a code demonstration using a Python notebook for this model, and attendees will have an opportunity to watch coding demo tasks on this joint NLU task to further their understanding.
Most TextVQA approaches focus on integrating objects, scene texts and question words with a simple transformer encoder. However, this fails to capture the semantic relations between different modalities. This paper proposes a Scene Graph based co-Attention Network (SceneGATE) for TextVQA, which reveals the semantic relations among the objects, Optical Character Recognition (OCR) tokens and the question words. This is achieved by a TextVQA-based scene graph that discovers the underlying semantics of an image. We create a guided-attention module to capture the intra-modal interplay between the language and the vision as guidance for inter-modal interactions. To explicitly teach the relations between the two modalities, we propose and integrate two attention modules, namely a scene graph-based semantic relation-aware attention and a positional relation-aware attention. We conduct extensive experiments on two benchmark datasets, Text-VQA and ST-VQA. The results show that our SceneGATE method outperforms existing ones because of the scene graph and its attention modules.
Scene Graph Generation (SGG) serves as a comprehensive representation of images for human understanding as well as for visual understanding tasks. Due to the long-tail bias of the object and predicate labels in the available annotated data, the scene graphs generated by current methodologies can be biased toward common, non-informative relationship labels. Relationships can sometimes be non-mutually exclusive and can be described from multiple perspectives, such as geometric or semantic relationships, making it even more challenging to predict the most suitable relationship label. In this work, we propose the SG-Shuffle pipeline for scene graph generation with 3 components: 1) Parallel Transformer Encoder, which learns to predict object relationships in a more exclusive manner by grouping relationship labels into groups of similar purpose; 2) Shuffle Transformer, which learns to select the final relationship labels from the category-specific features generated in the previous step; and 3) Weighted CE loss, used to alleviate the training bias caused by the imbalanced dataset.
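To make the third component concrete, the sketch below shows a generic class-weighted cross-entropy loss. The abstract does not state how the class weights are chosen, so the inverse-frequency weighting used here, and all function names, are assumptions made for illustration rather than the SG-Shuffle implementation.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_total = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_total for z in logits]


def inverse_frequency_weights(targets, num_classes):
    """Assumed weighting scheme: weight_c = N / (num_classes * count_c)."""
    counts = [0] * num_classes
    for y in targets:
        counts[y] += 1
    n = len(targets)
    return [n / (num_classes * c) if c > 0 else 0.0 for c in counts]


def weighted_cross_entropy(logits_batch, targets, class_weights):
    """Weighted mean of per-sample losses: sum_i w[y_i] * CE_i / sum_i w[y_i]."""
    weighted_losses = [
        -class_weights[y] * log_softmax(logits)[y]
        for logits, y in zip(logits_batch, targets)
    ]
    total_weight = sum(class_weights[y] for y in targets)
    return sum(weighted_losses) / total_weight


# Toy imbalanced batch: class 0 is frequent, class 1 is rare.
targets = [0, 0, 0, 1]
logits_batch = [[2.0, 0.0]] * 4  # a model that always favours class 0
weights = inverse_frequency_weights(targets, num_classes=2)
plain = weighted_cross_entropy(logits_batch, targets, [1.0, 1.0])
weighted = weighted_cross_entropy(logits_batch, targets, weights)
print(weights, plain, weighted)
```

With these weights the single misclassified rare-class sample contributes more to the loss than it does under uniform weighting, which is the effect a weighted CE loss relies on to counter label imbalance.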
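The collaborative filtering problem is generally solved with matrix completion techniques, which recover the missing values of the user-item interaction matrix. In the matrix, a rated position specifically identifies the given user and the rated item. Previous matrix completion techniques tend to neglect the position of each element (user, item and rating) in the matrix and mainly focus on the semantic similarity between users and items to predict the missing values. This paper proposes Super-Rec, a novel position-enhanced user/item representation training model for recommendation. We first use relative positional rating encoding to store the position-enhanced rating information and its user-item relationship in an embedding of fixed dimension that is not affected by the matrix size. Then, we apply the trained position-enhanced user and item representations to the simplest traditional machine learning models in order to highlight the pure novelty of our representation model. We contribute the first formal introduction and quantitative analysis of position-enhanced item representations in the recommendation domain, and provide a principled discussion of how Super-Rec outperforms typical collaborative filtering recommendation tasks with both explicit and implicit feedback.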
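Text-based games (TBG) are complex environments that allow users or computer agents to interact through text and achieve game goals. Building goal-oriented computer agents for text-based games is challenging, especially when step-wise feedback is the only text input to the model. Moreover, it is hard for agents to produce text inputs of flexible length and form by evaluating a much larger text input space. In this paper, we provide an extensive analysis of deep learning methods applied to the field of text-based games.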
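We present an efficient bi-encoder framework for named entity recognition (NER), which applies contrastive learning to map candidate text spans and entity types into the same vector representation space. Prior work predominantly approaches NER as sequence labeling or span classification. We instead frame NER as a metric learning problem that maximizes the similarity between the vector representations of an entity mention and its type. This makes it easy to handle both nested and flat NER, and allows noisy self-supervision signals to be better exploited. A major challenge for this bi-encoder formulation of NER lies in separating non-entity spans from entity mentions. Instead of explicitly labeling all non-entity spans as the same Outside (O) class, as most prior methods do, we introduce a novel dynamic thresholding loss that is learned together with the standard contrastive loss. Experiments show that our method performs well in both supervised and distantly supervised settings (e.g., GENIA, NCBI, BC5CDR, JNLPBA).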
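The metric-learning view of NER can be sketched at inference time as follows: a candidate span is assigned the entity type whose vector is most similar to the span vector, and it is treated as a non-entity when that similarity does not exceed a threshold. The vectors, the type names, the cosine similarity measure, and the fixed scalar threshold below are all assumptions made for this example. In the paper the threshold is learned through the dynamic thresholding loss, whose exact form the abstract does not give.

```python
import math


def cosine(u, v):
    """Cosine similarity between two nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def predict_entity_type(span_vec, type_vecs, threshold):
    """Return the most similar type, or None if no type beats the threshold."""
    best_type, best_sim = max(
        ((name, cosine(span_vec, vec)) for name, vec in type_vecs.items()),
        key=lambda pair: pair[1],
    )
    return best_type if best_sim > threshold else None


# Hypothetical type embeddings that a type encoder might produce.
type_vecs = {"Disease": [1.0, 0.0], "Chemical": [0.0, 1.0]}

# A span close to the "Disease" vector is labelled as such.
print(predict_entity_type([0.9, 0.1], type_vecs, threshold=0.8))
# A span equally far from both types falls below the threshold: non-entity.
print(predict_entity_type([1.0, 1.0], type_vecs, threshold=0.8))
```

Because non-entity spans are rejected by the threshold rather than by a dedicated Outside class, overlapping spans can each receive their own type, which is what makes nested NER straightforward in this formulation.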
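Recognizing the layout of unstructured digital documents is crucial when parsing them into a structured, machine-readable format for downstream applications. Recent studies in document layout analysis (DLA) usually rely on computer vision models to understand documents while ignoring other information, such as contextual information or the relations between document components, which is vital to capture. Our DOC-GCN presents an effective way to harmonize and integrate heterogeneous aspects for document layout analysis. We first construct graphs to explicitly describe four main aspects: syntactic, semantic, density, and appearance/visual information. Then, we apply graph convolutional networks to represent each aspect of information and use pooling to integrate them. Finally, we aggregate the aspects and feed them into a 2-layer MLP for document layout component classification. Our DOC-GCN achieves new state-of-the-art results on three widely used DLA datasets.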
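The graph-then-pool steps described above can be sketched with a standard graph convolution layer, ReLU(D^{-1/2} (A + I) D^{-1/2} X W), followed by mean pooling and concatenation of the per-aspect vectors. This is the widely used symmetric-normalization formulation and is assumed here for illustration. The abstract does not specify the exact layer, pooling operator, or graph construction, and all names and toy values below are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def normalized_adjacency(adj):
    """Compute D^{-1/2} (A + I) D^{-1/2} for an undirected graph."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def gcn_layer(adj, features, weight):
    """One graph convolution: propagate, project, then apply ReLU."""
    propagated = matmul(normalized_adjacency(adj), features)
    return [[max(0.0, v) for v in row] for row in matmul(propagated, weight)]


def mean_pool(node_features):
    """Average node representations into one graph-level vector."""
    n = len(node_features)
    return [sum(col) / n for col in zip(*node_features)]


# Two toy "aspect" graphs over the same two document components.
adj = [[0.0, 1.0], [1.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
semantic = mean_pool(gcn_layer(adj, [[1.0, 0.0], [0.0, 1.0]], identity))
visual = mean_pool(gcn_layer(adj, [[2.0, 0.0], [0.0, 2.0]], identity))

# Concatenate the aspect vectors; a classifier such as an MLP would take this.
document_vector = semantic + visual
print(document_vector)
```

Keeping one graph per aspect and merging only after pooling lets each aspect use its own edge structure, at the cost of deferring any cross-aspect interaction to the final classifier.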
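The attention mechanism has been used as an important component across vision-and-language (VL) tasks to bridge the semantic gap between visual and textual features. Although attention has been widely used in VL tasks, the ability of different attention alignment calculations to bridge the semantic gap between visual and textual clues has not been examined. In this research, we conduct a comprehensive analysis of the role of attention alignment by looking into attention score calculation methods and checking how well they actually represent the significance of visual regions and textual tokens for the global assessment. We also analyse the conditions under which an attention score calculation mechanism is more (or less) interpretable and may affect model performance on three different VL tasks: visual question answering, text-to-image generation, and text-and-image matching (both sentence and image retrieval). Our analysis is the first of its kind and provides insight into the importance of each attention alignment score calculation when applied in the training phase of VL tasks, which is commonly overlooked in attention-based cross-modal models and/or pretrained models.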
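To illustrate what differing attention score calculations mean in practice, the sketch below aligns one textual query vector with three visual region vectors under three common scoring functions: dot product, scaled dot product, and cosine similarity. The choice of these three functions, the toy vectors, and the function names are assumptions made for this example. The abstract does not list which calculations the paper compares.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_score(q, k):
    return sum(a * b for a, b in zip(q, k))


def scaled_dot_score(q, k):
    """Dot product divided by sqrt(d), as in Transformer attention."""
    return dot_score(q, k) / math.sqrt(len(q))


def cosine_score(q, k):
    """Dot product of the two vectors after length normalization."""
    norm_q = math.sqrt(dot_score(q, q))
    norm_k = math.sqrt(dot_score(k, k))
    return dot_score(q, k) / (norm_q * norm_k)


def attention_weights(query, keys, score_fn):
    """Alignment weights of one query over all keys under a scoring function."""
    return softmax([score_fn(query, k) for k in keys])


# One text-token query and three visual-region keys (toy values).
query = [1.0, 0.0, 1.0, 0.0]
regions = [[2.0, 0.0, 2.0, 0.0], [0.0, 1.0, 0.0, 1.0], [1.0, 1.0, 1.0, 1.0]]

for fn in (dot_score, scaled_dot_score, cosine_score):
    print(fn.__name__, attention_weights(query, regions, fn))
```

All three functions rank the first region highest here, but they distribute the weight differently: the unscaled dot product gives the most peaked distribution, while cosine similarity ignores vector magnitude and gives the flattest. Differences of this kind are what make the choice of score calculation relevant to interpretability.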